Model Selection

16kHz sampling rate

Vits Icelandic Rosa Female Monospeaker

An Icelandic text-to-speech model fine-tuned from facebook/mms-tts-isl on the Talrómur dataset, specializing in female voice synthesis.

Speech Synthesis

Transformers Other

Whisper Small Japanese

A Japanese speech recognition model fine-tuned from openai/whisper-small for speech-to-text tasks.

Speech Recognition

Transformers Japanese

Wav2vec2 Large Xlsr 53 Japanese

A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input.

Speech Recognition

Transformers Japanese
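Models in this list expect audio sampled at 16kHz, so recordings at other rates (44.1kHz or 48kHz, for example) need to be resampled before inference. The sketch below shows the idea with plain linear interpolation in standard-library Python. The function name `resample_linear` and its parameters are illustrative assumptions, not part of any model's API, and the method applies no anti-aliasing filter; a dedicated audio library would normally be the better choice in practice.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = 16000) -> List[float]:
    """Resample a mono signal to dst_rate using linear interpolation.

    Hypothetical helper for illustration only: it does not low-pass
    filter the signal, so downsampling may introduce aliasing.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    if src_rate == dst_rate:
        return [float(s) for s in samples]

    # Number of output samples covering the same duration.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    # Distance between output samples, measured in input samples.
    step = src_rate / dst_rate
    last = len(samples) - 1

    out: List[float] = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)
        if i >= last:
            # Past the final input sample: hold the last value.
            out.append(float(samples[last]))
        else:
            frac = pos - i
            out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


# One second of silence at 48kHz becomes 16000 samples at 16kHz.
one_second = [0.0] * 48000
print(len(resample_linear(one_second, 48000)))
```

The resulting list can then be passed to whatever feature extractor the chosen model uses, with the sampling rate declared as 16000.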

Exp W2v2t Fa Hubert S801

A Persian automatic speech recognition model fine-tuned from facebook/hubert-large-ll60k, trained using the Common Voice 7.0 Persian dataset.

Speech Recognition

Transformers Other

Exp W2v2t Sv Se Vp Nl S842

A Swedish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-nl-voxpopuli, trained using the Common Voice 7.0 (sv-SE) dataset.

Speech Recognition

Exp W2v2t Sv Se Wavlm S42

A Swedish automatic speech recognition model fine-tuned from microsoft/wavlm-large, suitable for 16kHz sampled audio input.

Speech Recognition

Exp W2v2t Fr Vp Fr S438

A French automatic speech recognition model fine-tuned from facebook/wav2vec2-large-fr-voxpopuli, trained using the Common Voice 7.0 French dataset.

Speech Recognition

Transformers French

Exp W2v2t Fr Unispeech S42

A French speech recognition model fine-tuned from microsoft/unispeech-large-1500h-cv, trained using the Common Voice 7.0 French dataset.

Speech Recognition

Transformers French

Exp W2v2t It No Pretraining S842

An Italian speech recognition model trained from a randomly initialized wav2vec2 model (no pretraining) on the training split of Common Voice 7.0 (Italian).

Speech Recognition

Transformers Other

Exp W2v2t It Xlsr 53 S387

An Italian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained using the Common Voice 7.0 Italian dataset.

Speech Recognition

Transformers Other

Exp W2v2t It Vp 100k S449

An Italian automatic speech recognition model fine-tuned from the facebook/wav2vec2-large-100k-voxpopuli model, trained using the Common Voice 7.0 Italian dataset.

Speech Recognition

Transformers Other

Exp W2v2t It Wav2vec2 S609

An Italian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-lv60, trained using the Common Voice 7.0 Italian dataset.

Speech Recognition

Transformers Other

Exp W2v2t Th Hubert S533

A Thai speech recognition model fine-tuned from facebook/hubert-large-ll60k, trained on the Common Voice 7.0 dataset.

Speech Recognition

Transformers Other

Exp W2v2t Th Wav2vec2 S664

A Thai speech recognition model fine-tuned from facebook/wav2vec2-large-lv60, trained using the Common Voice 7.0 dataset.

Speech Recognition

Transformers Other

Exp W2v2t En Vp Nl S281

An English speech recognition model fine-tuned from facebook/wav2vec2-large-nl-voxpopuli, trained using the Common Voice 7.0 training set.

Speech Recognition

Transformers English

Exp W2v2t En No Pretraining S289

An English speech recognition model trained from a randomly initialized wav2vec2 architecture (no pretraining) using the Common Voice 7.0 dataset.

Speech Recognition

Transformers English

Sharif Wav2vec2

A fine-tuned version of Sharif Wav2vec2 for Persian automatic speech recognition, trained on Common Voice Persian samples.

Speech Recognition

Transformers Other

Data2vec Audio Large 960h

Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language tasks. This large audio model is pre-trained and fine-tuned on 960 hours of LibriSpeech data, specifically optimized for automatic speech recognition tasks.

Speech Recognition

Transformers English

Wav2vec2 Base Da Ft Nst

A Danish speech recognition model fine-tuned on the NST dataset, supporting 16kHz sampled audio input.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 English

An English speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice 6.1 dataset.

Speech Recognition English

Wav2vec2 Large Xlsr Hindi

A Hindi automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on low-resource Indian language datasets.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Slovenian

A Slovenian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice dataset, with a word error rate (WER) of 36.04%.

Speech Recognition Other
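Several entries in this list report a word error rate (WER). WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of words in the reference. The following is a minimal sketch in plain Python; the function name `word_error_rate` is an illustrative assumption, and it splits on whitespace only, so the text normalization used for any figure quoted on this page may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative helper: tokenizes by whitespace and applies no casing or
    punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One of six reference words is missing from the hypothesis.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{wer:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.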

Wav2vec2 Large Xlsr Kazakh

This is a Kazakh automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Kazakh speech corpus v1.1 with a test WER of 19.65%.

Speech Recognition Other

Wav2vec2 Base Vietnamese

A Vietnamese speech recognition model based on the Wav2Vec2 architecture, fine-tuned on the VLSP dataset, supporting 16kHz sampled speech input.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Finnish

A Finnish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Eu

A Basque automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, achieving a 15.34% word error rate (WER) on the Common Voice Basque test set.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Hindi

A Hindi speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Turkish Artificial

A Turkish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained using an artificial Common Voice dataset.

Speech Recognition Other

Wav2vec2 Xls R 1b Italian

An Italian automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple Italian datasets.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Javanese

A Javanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on high-quality Javanese TTS data from OpenSLR.

Speech Recognition Other

Wav2vec2 Large Xlsr Sundanese

A Sundanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on high-quality TTS data from OpenSLR.

Speech Recognition Other

Xlsr En Punctuation

An English automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the English Common Voice dataset, supporting punctuation prediction.

Speech Recognition English

Wav2vec2 Xls R 1b Polish

A Polish automatic speech recognition (ASR) model fine-tuned from the 1-billion-parameter XLS-R model, trained on datasets such as Common Voice 8.0 and supporting 16kHz sampled audio input.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Turkish

A Turkish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Finnish

A Finnish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input.

Speech Recognition

Transformers Other

Wav2vec2 Xlsr Khmer

A Khmer speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, achieving a WER of 24.96% on the OpenSLR Khmer dataset.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Frisian

An automatic speech recognition model fine-tuned for Frisian using the Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.

Speech Recognition

Wav2vec2 Large Xlsr Indonesian

An Indonesian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Indonesian Common Voice dataset.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Spanish

This is an automatic speech recognition (ASR) model fine-tuned on the Spanish Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.

Speech Recognition Spanish

Wav2vec2 Large Xlsr 53 Romanian

A Romanian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice Romanian dataset.

Speech Recognition Other
